Introduction

In our modern world, using data to address societal issues has become increasingly important. This report examines a data-driven approach to enhancing public safety in Colchester, combining meteorological data with crime incident reports. By analyzing patterns and correlations between crime incidents and weather conditions, it aims to provide information that could inform proactive measures for improving community safety. By estimating crime risk and identifying where crimes occur most frequently, authorities can better allocate resources and implement safety measures, particularly in high-risk areas or on specific streets. Additionally, understanding the outcome status of cases can help police prioritize their efforts, focusing on significant cases rather than those unlikely to have a substantial impact on public safety.

getwd()
## [1] "/Users/nithyashree/Downloads"
setwd("/Users/nithyashree/Documents/Data Visualization")
#read the data
crime420co4 <- read.csv("crime23.csv")
temp180co4  <- read.csv("temp2023.csv")
library(stringr)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(DT)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggplot2)
library(viridis)
## Loading required package: viridisLite
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:plotly':
## 
##     select
## The following object is masked from 'package:dplyr':
## 
##     select
library(leaflet)
# View the structure of the data frames
str(crime420co4)
## 'data.frame':    6878 obs. of  12 variables:
##  $ category        : chr  "anti-social-behaviour" "anti-social-behaviour" "anti-social-behaviour" "anti-social-behaviour" ...
##  $ persistent_id   : chr  "" "" "" "" ...
##  $ date            : chr  "2023-01" "2023-01" "2023-01" "2023-01" ...
##  $ lat             : num  51.9 51.9 51.9 51.9 51.9 ...
##  $ long            : num  0.909 0.902 0.898 0.902 0.895 ...
##  $ street_id       : int  2153366 2153173 2153077 2153186 2153012 2153379 2153105 2153541 2152937 2153107 ...
##  $ street_name     : chr  "On or near Military Road" "On or near " "On or near Culver Street West" "On or near Ryegate Road" ...
##  $ context         : logi  NA NA NA NA NA NA ...
##  $ id              : int  107596596 107596646 107595950 107595953 107595979 107595985 107596603 107596291 107596305 107596453 ...
##  $ location_type   : chr  "Force" "Force" "Force" "Force" ...
##  $ location_subtype: chr  "" "" "" "" ...
##  $ outcome_status  : chr  NA NA NA NA ...
str(temp180co4)
## 'data.frame':    365 obs. of  18 variables:
##  $ station_ID     : int  3590 3590 3590 3590 3590 3590 3590 3590 3590 3590 ...
##  $ Date           : chr  "2023-12-31" "2023-12-30" "2023-12-29" "2023-12-28" ...
##  $ TemperatureCAvg: num  8.7 6.6 9.9 9.9 5.8 9.8 12.5 10 9.6 10 ...
##  $ TemperatureCMax: num  10.6 9.7 11.4 11.5 10.6 12.7 14.3 12 10.8 12.6 ...
##  $ TemperatureCMin: num  4.4 4.4 6.9 4 3.9 6.3 9.5 8.4 8.1 8.1 ...
##  $ TdAvgC         : num  7.2 4.2 6 7.5 3.7 7.6 10.1 7 6.5 6.2 ...
##  $ HrAvg          : num  89.6 85.5 77.2 84.6 86.4 86.9 85.3 81.5 81.2 78.2 ...
##  $ WindkmhDir     : chr  "S" "WSW" "SW" "SSW" ...
##  $ WindkmhInt     : num  25 22.7 32.8 32.2 13.2 23.5 34.1 32.7 34.1 37.5 ...
##  $ WindkmhGust    : num  63 50 61.2 70.4 37.1 46.3 72.3 61.2 68.6 77.8 ...
##  $ PresslevHp     : num  999 1007 1004 1003 1016 ...
##  $ Precmm         : num  6.2 0.4 0.8 2.8 2 4.4 0.8 0.8 0 2 ...
##  $ TotClOct       : num  8 4.6 6.5 6.8 4 6.5 7.8 5 8 7.5 ...
##  $ lowClOct       : num  8 6.5 6.7 7.1 6.9 7.4 7.8 6.7 8 7.5 ...
##  $ SunD1h         : num  0 1.1 0.1 0 3.2 0 0 2.9 0 1.4 ...
##  $ VisKm          : num  26.3 48.3 26.7 25.1 30.1 45.8 61.8 72.9 69.4 34.3 ...
##  $ PreselevHp     : logi  NA NA NA NA NA NA ...
##  $ SnowDepcm      : int  NA NA NA NA NA NA NA NA NA NA ...
# Check for missing values
sum(is.na(crime420co4))
## [1] 7555
sum(is.na(temp180co4))
## [1] 851

Data Preparation

Data preparation is a critical step in the analytical process: before any analysis begins, the integrity and reliability of the data sources must be ensured.

Crime Data Cleaning

The crime dataset, initially in a raw state, underwent a thorough cleaning process to address missing values, standardize formats, and remove irrelevant information. Missing values in numeric columns were replaced with appropriate measures of central tendency, and textual data such as street names was converted to a consistent format, supporting accurate analysis and visualization.

# Create a new variable for the cleaned dataset
cleaned_crimeco4 <- crime420co4

#Get list of numeric columns
num_co4_col <- sapply(cleaned_crimeco4, is.numeric)

# Replace NA values in numeric columns with mean
cleaned_crimeco4[num_co4_col] <- lapply(cleaned_crimeco4[num_co4_col], function(x) {
    ifelse(is.na(x), round(mean(x, na.rm = TRUE), 1), x)
})

# Data cleaning for cleaned_crimeco4
# Fill missing values in outcome_status
cleaned_crimeco4$outcome_status[is.na(cleaned_crimeco4$outcome_status)] <- "No Information"

# Clean street names in crime data
cleaned_crimeco4$street_name <- str_trim(str_to_lower(cleaned_crimeco4$street_name))

# Parse the date column in the cleaned_crimeco4 dataset
cleaned_crimeco4$date <- ym(cleaned_crimeco4$date)

# Remove irrelevant columns (context, location_subtype)
cleaned_crimeco4 <- subset(cleaned_crimeco4, select = -c(context, location_subtype))

head(cleaned_crimeco4)
##                category persistent_id       date      lat     long street_id
## 1 anti-social-behaviour               2023-01-01 51.88306 0.909136   2153366
## 2 anti-social-behaviour               2023-01-01 51.90124 0.901681   2153173
## 3 anti-social-behaviour               2023-01-01 51.88907 0.897722   2153077
## 4 anti-social-behaviour               2023-01-01 51.89122 0.901988   2153186
## 5 anti-social-behaviour               2023-01-01 51.89416 0.895433   2153012
## 6 anti-social-behaviour               2023-01-01 51.88050 0.909014   2153379
##                     street_name        id location_type outcome_status
## 1      on or near military road 107596596         Force No Information
## 2                    on or near 107596646         Force No Information
## 3 on or near culver street west 107595950         Force No Information
## 4       on or near ryegate road 107595953         Force No Information
## 5       on or near market close 107595979         Force No Information
## 6         on or near lisle road 107595985         Force No Information

Climate Data Cleaning

The weather dataset underwent analogous transformations to ensure consistency and completeness. Numeric columns were checked for missing values, which were filled with mean values to keep the data reliable, and irrelevant variables were identified and removed, simplifying the dataset for analysis.

#cleaned dataset name
cleaned_tempco4 <- temp180co4

# List of numeric columns in temp180co4
num_col_tempco4 <- sapply(cleaned_tempco4, is.numeric)

# Replace NA values in numeric columns with mean, preserving original precision
cleaned_tempco4[num_col_tempco4] <- lapply(cleaned_tempco4[num_col_tempco4], function(x) {
  ifelse(is.na(x), round(mean(x, na.rm = TRUE), 1), x)  
})



# Parse the Date column in temp180co4 dataset
cleaned_tempco4$Date <- ymd(cleaned_tempco4$Date)

# Remove irrelevant columns (PreselevHp, SnowDepcm)
cleaned_tempco4 <- cleaned_tempco4[, !names(cleaned_tempco4) %in% c("PreselevHp", "SnowDepcm")]

head(cleaned_tempco4)
##   station_ID       Date TemperatureCAvg TemperatureCMax TemperatureCMin TdAvgC
## 1       3590 2023-12-31             8.7            10.6             4.4    7.2
## 2       3590 2023-12-30             6.6             9.7             4.4    4.2
## 3       3590 2023-12-29             9.9            11.4             6.9    6.0
## 4       3590 2023-12-28             9.9            11.5             4.0    7.5
## 5       3590 2023-12-27             5.8            10.6             3.9    3.7
## 6       3590 2023-12-26             9.8            12.7             6.3    7.6
##   HrAvg WindkmhDir WindkmhInt WindkmhGust PresslevHp Precmm TotClOct lowClOct
## 1  89.6          S       25.0        63.0      999.0    6.2      8.0      8.0
## 2  85.5        WSW       22.7        50.0     1006.9    0.4      4.6      6.5
## 3  77.2         SW       32.8        61.2     1003.6    0.8      6.5      6.7
## 4  84.6        SSW       32.2        70.4     1003.2    2.8      6.8      7.1
## 5  86.4         SW       13.2        37.1     1016.4    2.0      4.0      6.9
## 6  86.9        WSW       23.5        46.3     1006.2    4.4      6.5      7.4
##   SunD1h VisKm
## 1    0.0  26.3
## 2    1.1  48.3
## 3    0.1  26.7
## 4    0.0  25.1
## 5    3.2  30.1
## 6    0.0  45.8
# View the structure of the data frames
str(cleaned_tempco4)
## 'data.frame':    365 obs. of  16 variables:
##  $ station_ID     : int  3590 3590 3590 3590 3590 3590 3590 3590 3590 3590 ...
##  $ Date           : Date, format: "2023-12-31" "2023-12-30" ...
##  $ TemperatureCAvg: num  8.7 6.6 9.9 9.9 5.8 9.8 12.5 10 9.6 10 ...
##  $ TemperatureCMax: num  10.6 9.7 11.4 11.5 10.6 12.7 14.3 12 10.8 12.6 ...
##  $ TemperatureCMin: num  4.4 4.4 6.9 4 3.9 6.3 9.5 8.4 8.1 8.1 ...
##  $ TdAvgC         : num  7.2 4.2 6 7.5 3.7 7.6 10.1 7 6.5 6.2 ...
##  $ HrAvg          : num  89.6 85.5 77.2 84.6 86.4 86.9 85.3 81.5 81.2 78.2 ...
##  $ WindkmhDir     : chr  "S" "WSW" "SW" "SSW" ...
##  $ WindkmhInt     : num  25 22.7 32.8 32.2 13.2 23.5 34.1 32.7 34.1 37.5 ...
##  $ WindkmhGust    : num  63 50 61.2 70.4 37.1 46.3 72.3 61.2 68.6 77.8 ...
##  $ PresslevHp     : num  999 1007 1004 1003 1016 ...
##  $ Precmm         : num  6.2 0.4 0.8 2.8 2 4.4 0.8 0.8 0 2 ...
##  $ TotClOct       : num  8 4.6 6.5 6.8 4 6.5 7.8 5 8 7.5 ...
##  $ lowClOct       : num  8 6.5 6.7 7.1 6.9 7.4 7.8 6.7 8 7.5 ...
##  $ SunD1h         : num  0 1.1 0.1 0 3.2 0 0 2.9 0 1.4 ...
##  $ VisKm          : num  26.3 48.3 26.7 25.1 30.1 45.8 61.8 72.9 69.4 34.3 ...
str(cleaned_crimeco4)
## 'data.frame':    6878 obs. of  10 variables:
##  $ category      : chr  "anti-social-behaviour" "anti-social-behaviour" "anti-social-behaviour" "anti-social-behaviour" ...
##  $ persistent_id : chr  "" "" "" "" ...
##  $ date          : Date, format: "2023-01-01" "2023-01-01" ...
##  $ lat           : num  51.9 51.9 51.9 51.9 51.9 ...
##  $ long          : num  0.909 0.902 0.898 0.902 0.895 ...
##  $ street_id     : int  2153366 2153173 2153077 2153186 2153012 2153379 2153105 2153541 2152937 2153107 ...
##  $ street_name   : chr  "on or near military road" "on or near" "on or near culver street west" "on or near ryegate road" ...
##  $ id            : int  107596596 107596646 107595950 107595953 107595979 107595985 107596603 107596291 107596305 107596453 ...
##  $ location_type : chr  "Force" "Force" "Force" "Force" ...
##  $ outcome_status: chr  "No Information" "No Information" "No Information" "No Information" ...
# Check for missing values
sum(is.na(cleaned_tempco4))
## [1] 0
sum(is.na(cleaned_crimeco4))
## [1] 0

Analysis of Violent Crime Incidents by Season

# Function to categorize dates into seasons
get_sew <- function(date) {
  month <- month(date)
  if (month %in% 3:5) {
    return("Spring")
  } else if (month %in% 6:8) {
    return("Summer")
  } else if (month %in% 9:11) {
    return("Fall")
  } else {
    return("Winter")
  }
}

# Filter violent crimes from the cleaned data, then label each date's season
vc_co4 <- cleaned_crimeco4 %>%
  filter(category == "violent-crime") %>%
  mutate(season = factor(sapply(date, get_sew)))

# Aggregate violent crime incidents by season
vc_seson <- vc_co4 %>%
  group_by(season) %>%
  summarise(count = n())

# Define custom colors for each season
sen_colos <- c(Spring = "#FFD700", Summer = "#32CD32", Fall = "#FF4500", Winter = "#4169E1")

# Build the bar plot, matching each bar's color to its season by name
vil_plt <- plot_ly(data = vc_seson, x = ~season, y = ~count, type = "bar",
                   marker = list(color = sen_colos[as.character(vc_seson$season)])) %>%
  layout(title = "Violent Crime Incidents by Season",
         xaxis = list(title = "Season"),
         yaxis = list(title = "Violent Crime Count"))

# Display the interactive plot
vil_plt

The interactive bar plot illustrates the distribution of violent crime incidents across different seasons in Colchester. Each bar represents a season, with colors indicating Spring, Summer, Fall, and Winter. Notably, the plot reveals that the highest number of violent crimes occurred during the Fall season, with a count of 693 incidents. This information can guide law enforcement agencies in implementing targeted strategies to address crime during specific seasons of the year.
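The seasonal pattern suggests a link between weather and crime volume, the relationship the introduction set out to explore. The sketch below illustrates one way to quantify it: aggregate crime counts and mean temperatures to the month, join the two series, and compute their correlation. The small `crime_demo` and `temp_demo` data frames are hypothetical stand-ins with the same columns as `cleaned_crimeco4` and `cleaned_tempco4`; in the report itself, the cleaned data frames from the preparation steps above would be used directly.

```r
library(dplyr)
library(lubridate)

# Stand-in data frames (same columns as the cleaned report data)
crime_demo <- data.frame(
  date = ymd(c("2023-01-01", "2023-01-01", "2023-02-01"))
)
temp_demo <- data.frame(
  Date = ymd(c("2023-01-05", "2023-01-20", "2023-02-10")),
  TemperatureCAvg = c(4.0, 6.0, 7.5)
)

# Monthly crime counts (crime dates are already first-of-month after ym())
mon_crime <- crime_demo %>%
  mutate(month = floor_date(date, "month")) %>%
  count(month, name = "crime_count")

# Monthly mean temperature from the daily weather records
mon_temp <- temp_demo %>%
  mutate(month = floor_date(Date, "month")) %>%
  group_by(month) %>%
  summarise(mean_temp = mean(TemperatureCAvg), .groups = "drop")

# Join on month and measure the crime-temperature association
mon_both <- inner_join(mon_crime, mon_temp, by = "month")
cor(mon_both$crime_count, mon_both$mean_temp)
```

With a full year of data this yields twelve monthly pairs, enough for a first look at whether warmer months see more reported incidents.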

Understanding Crime Hotspots in Colchester

# Filtered crime data with latitude and longitude columns
cr_dtsm <- cleaned_crimeco4 %>%
  filter(!is.na(lat) & !is.na(long))

# Calculate the 2D kernel density estimation of crime incidents
den_we <- kde2d(cr_dtsm$long, cr_dtsm$lat)


# Create a point density plot (heatmap), mapping the axes to the
# estimated longitude/latitude grid so the axis labels match the data
den_we_plt <- plot_ly(x = den_we$x, y = den_we$y, z = ~den_we$z, type = "heatmap",
                      colorscale = "Viridis", zauto = FALSE, zmax = max(den_we$z)) %>%
  layout(title = "Point Density Plot of Crime Incidents",
         xaxis = list(title = "Longitude"),
         yaxis = list(title = "Latitude"))

# Display the interactive point density plot
den_we_plt

The heatmap reveals the spatial distribution of crime incidents in Colchester, with brighter (yellow) regions on the Viridis scale indicating higher densities of reported crimes. By analyzing the heatmap, law enforcement agencies can identify areas with high crime rates, enabling them to allocate resources more effectively and implement targeted crime prevention strategies. Hotspots identified on the heatmap can serve as focal points for increased police patrols or community outreach programs aimed at reducing crime in specific neighborhoods. Understanding the geographic patterns of crime incidents can also help urban planners and policymakers make informed decisions about city development and resource allocation.
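The density surface highlights broad areas; the street-level view mentioned in the introduction can be sketched by counting incidents per `street_name`. The `crime_demo` data frame below is a hypothetical stand-in with the same `street_name` column as `cleaned_crimeco4`, which would be substituted in the report itself.

```r
library(dplyr)

# Stand-in for cleaned_crimeco4 (street names already lower-cased and trimmed)
crime_demo <- data.frame(
  street_name = c("on or near high street", "on or near high street",
                  "on or near military road", "on or near ryegate road")
)

# Rank streets by reported incident count to surface candidate hotspots
top_streets <- crime_demo %>%
  count(street_name, name = "incidents", sort = TRUE)

head(top_streets, 10)
```

A table like this complements the heatmap by naming the specific streets where patrols or outreach could be concentrated.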

Exploring Violent Crime Incidents in Colchester: A Geographic Overview

# Define the path to the downloaded icon
ic_pathyy <- "/Users/nithyashree/Downloads/icons8-danger-48-2.png"

# Define a custom icon with popup
cs_ic <- makeIcon(
  iconUrl = ic_pathyy,  
  iconWidth = 12,       
  iconHeight = 12      
)

# Create Leaflet map for violent crimes
vc_mapco4 <- leaflet() %>%
  addTiles() %>%
  addMarkers(
    data = vc_co4,
    lng = ~long,
    lat = ~lat,
    icon = cs_ic,  # Use custom icon
    popup = ~paste("Category: ", category, "<br>Date: ", date)  
  ) %>%
  setView(lng = mean(vc_co4$long), lat = mean(vc_co4$lat), zoom = 12)  # Set initial view and zoom level
vc_mapco4 

This interactive map uses custom danger icons to pinpoint the locations of violent crime incidents in Colchester. Each icon represents a specific incident and provides information about its category and date. By exploring the map, one can identify areas with a higher concentration of violent crimes and observe spatial patterns or clusters, and by examining the dates associated with each incident, one can detect temporal trends in violent criminal activity across different locations. This visualization helps in understanding the geographical distribution and temporal dynamics of violent crime incidents, supporting informed decision-making for law enforcement and community safety initiatives.

Future Work

Future work entails developing machine learning models that integrate historical crime data with meteorological information to predict the outcome status of criminal incidents, helping investigators prioritize cases. These models would be integrated with interactive data visualization tools, enabling stakeholders to derive actionable insights from complex datasets and identify trends and potential hotspots in real time. Additionally, predictive algorithms would be developed to forecast future crime trends by analyzing historical patterns and weather variables, with a particular focus on temperature dynamics.
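As a minimal sketch of the kind of model described, a logistic regression could relate case resolution to weather and crime features. Everything below is illustrative: the `demo` data frame, its columns (`mean_temp`, `is_violent`, `resolved`), and the simulated relationship are hypothetical stand-ins for the merged crime/weather table such models would actually be trained on.

```r
# Toy merged data standing in for a joined crime/weather table
# (column names and the simulated effect are illustrative assumptions)
set.seed(42)
demo <- data.frame(
  mean_temp  = runif(200, 0, 25),
  is_violent = rbinom(200, 1, 0.3)
)
demo$resolved <- rbinom(200, 1, plogis(-1 + 0.05 * demo$mean_temp))

# Baseline logistic model: probability an incident reaches a resolved outcome
fit <- glm(resolved ~ mean_temp + is_violent, data = demo, family = binomial)

# Predicted probabilities could be used to rank open cases for triage
demo$p_resolved <- predict(fit, type = "response")
head(demo[order(-demo$p_resolved), ])
```

A production version would replace this baseline with richer features and a properly validated classifier, but the triage idea, scoring open cases and ranking them, stays the same.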

Conclusion

By exploring the outcomes, dates, seasons, and hotspots associated with violent crimes, this report highlights their significance in informing proactive law enforcement strategies. The integration of machine learning models with data visualization techniques and predictive algorithms represents a transformative approach to addressing public safety challenges. By harnessing data-driven insights and advanced analytics, law enforcement agencies can make informed decisions, prioritize resources effectively, and proactively mitigate risks to enhance community safety and well-being. As technology continues to evolve, ongoing research and innovation in this field will play a vital role in shaping the future of crime prevention and law enforcement practices.
